Abstract — This paper develops the sufficiency principle suitable
for data reduction in decentralized inference systems. Both
parallel and tandem networks are studied, with a focus on the
cases where observations at decentralized nodes are conditionally
dependent. For a parallel network, through the introduction of a 
hidden variable that induces conditional independence among the 
observations, the locally sufficient statistics, defined with respect 
to the hidden variable, are shown to be globally sufficient for 
the parameter of inference interest. For a tandem network, the 
notion of conditional sufficiency is introduced and the related 
theories and tools are developed. Finally, connections between the 
sufficiency principle and some distributed source coding problems 
are explored. 

I. Introduction 

The sufficiency principle has played a prominent role in 
designing data processing methods for statistical inference. A 
sufficient statistic is a function of the data that contains all the 
information in the data about the parameter of interest. The 
primary goal of sufficiency-based data reduction is dimensionality
reduction to facilitate subsequent inferences based on the
reduced data [1]-[3].

Suppose $\theta$ is the parameter of inference interest and
$X = \{X_1, \ldots, X_n\}$ is a vector of random variables whose
distribution is given by $p(x|\theta)$.¹ If $T(X)$ is a sufficient
statistic for $\theta$, then any inference about $\theta$ should depend
on $X$ only through $T(X)$ [2]. A useful tool to identify sufficient
statistics is the Neyman-Fisher factorization theorem [2, Theorem 6.2.6],
which states that a statistic $T(X)$ is sufficient for $\theta$ if and
only if there exist functions $g(t|\theta)$ and $h(x)$ such that

$$p(x|\theta) = g(T(x)|\theta)\, h(x).$$

If the parameter $\theta$ is itself random, the sufficiency principle
can also be reframed using the data processing inequality [4,
Section 2.9]. That is, a statistic $T(X)$ is sufficient if and only
if the following Markov chain holds:

$$\theta - T(X) - X.$$
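As a small numerical illustration of this equivalence (a sketch under an assumed toy model, not part of the original development): when $\theta$ is random, $\theta - T(X) - X$ holds exactly when the data processing inequality $I(\theta; T(X)) \le I(\theta; X)$ is met with equality. The snippet below assumes a hypothetical binary $\theta$ with two observations that are i.i.d. Bernoulli given $\theta$; the names `PRIOR`, `P_ONE`, and `mutual_info` are illustrative. It compares $I(\theta; X)$ with $I(\theta; X_1 + X_2)$ and with $I(\theta; X_1)$.

```python
import math
from itertools import product

# Hypothetical model (an illustrative assumption, not taken from the paper):
# theta is 0 or 1 with equal probability, and X = (X1, X2) are i.i.d.
# Bernoulli(P_ONE[theta]) given theta.
PRIOR = {0: 0.5, 1: 0.5}
P_ONE = {0: 0.2, 1: 0.7}


def mutual_info(stat):
    """Return I(theta; stat(X)) in bits by brute-force enumeration."""
    joint = {}
    for theta, x in product(PRIOR, product((0, 1), repeat=2)):
        p = PRIOR[theta]
        for xi in x:
            p *= P_ONE[theta] if xi else 1.0 - P_ONE[theta]
        key = (theta, stat(x))
        joint[key] = joint.get(key, 0.0) + p
    p_theta, p_t = {}, {}
    for (theta, t), p in joint.items():
        p_theta[theta] = p_theta.get(theta, 0.0) + p
        p_t[t] = p_t.get(t, 0.0) + p
    return sum(p * math.log2(p / (p_theta[theta] * p_t[t]))
               for (theta, t), p in joint.items() if p > 0)


i_data = mutual_info(lambda x: x)        # the raw data X
i_sum = mutual_info(lambda x: sum(x))    # T(X) = X1 + X2 (sufficient)
i_first = mutual_info(lambda x: x[0])    # X1 alone (not sufficient)
print(f"I(theta;X)={i_data:.6f}  I(theta;T)={i_sum:.6f}  I(theta;X1)={i_first:.6f}")
```

The first two quantities coincide up to rounding, while keeping only $X_1$ loses information about $\theta$, so $X_1$ alone is not sufficient in this toy model.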

For decentralized inference, data reduction is done locally 
without access to the global data. Therefore, the contrasting 
notions of local sufficiency and global sufficiency [5] need 
to be treated with care. A sufficient statistic defined with
respect to local data is referred to as a locally sufficient
statistic, while a sufficient statistic defined with respect to
the global data in the network is referred to as a globally
sufficient statistic [5]. As such, whether a statistic at a local
node is globally sufficient is not determined solely by the
statistical characterization of local data but also depends on
the joint distribution of the whole data and how data/statistics
are passed along within the network.

¹We do not distinguish between probability density and probability
mass functions; the meaning will be clear from the context of the
specific problem.

Fig. 1. Parallel network.

Fig. 2. Tandem network.

For conditionally independent observations (e.g., X and Y
are independent given $\theta$ in Figs. 1 and 2), local sufficiency
implies global sufficiency. This result was established in [5]- 
[7] for parallel networks (Fig. 1) and it is straightforward 
to show that the same result holds for tandem networks 
(Fig. 2). An interesting manifestation of the above result is 
in decentralized detection. It is well known that for a binary 
hypothesis testing problem, the likelihood ratio (LR) is a 
sufficient statistic for the underlying hypothesis. Therefore, it 
is not surprising that likelihood ratio quantizers are globally 
optimal for decentralized detection with conditionally indepen- 
dent observations [8], even with non-ideal, possibly coupling 
channels between the sensors and the fusion center [9], [10]. 

Without the conditional independence assumption, decen- 
tralized inference becomes considerably more complex. For 
decentralized detection, the problem of finding the optimal solution
becomes NP-complete when the observations are conditionally
dependent [11]. The primary focus of this paper is to develop
theories and tools for decentralized data reduction with conditionally
dependent observations for both parallel and tandem networks.

For parallel networks, we investigate the sufficiency prin- 
ciple under a hierarchical conditional independence (HCI) 
model, which is a new framework recently proposed to deal
with distributed detection with conditionally dependent
observations [12]. The main idea is to introduce a hidden
variable W such that the sensor observations are conditionally
independent given this new variable, regardless of the
dependence structure of the original model. Suitable conditions
are identified under this HCI model such that local sufficiency
implies global sufficiency.

For tandem networks such as that described in Fig. 2, Y is
fully available at the decision node. As such, the novel notion
of conditional sufficiency is defined to capture the difference
between this network structure and that of the parallel network.
A new set of theories and tools corresponding to conditional
sufficiency is then developed.

Finally, the developed notion of sufficiency is applied to 
some classical distributed source coding problems. There, 
sufficiency-based data reduction prior to a source encoder is 
shown to incur no penalty on the corresponding rate region or 
the rate distortion function. 

The rest of the paper is organized as follows. Section II
develops the sufficiency principle in parallel networks with
conditionally dependent observations. Section III deals with
tandem networks, where the notion of conditional sufficiency
is introduced and associated theories are developed. In Section
IV, the connection between the developed sufficiency principle
and two distributed source coding problems is explored.
Section V concludes the paper.

II. Sufficiency for Parallel Network 

This section considers only a parallel network of two
sensors, as illustrated in Fig. 1; the results extend naturally to
the case with an arbitrary number of sensors. Let the data available
at node X be X and the data available at node Y be Y.

Assume the parameter $\theta$ is random. The pair of statistics
$(T_x(X), T_y(Y))$ is globally sufficient for $\theta$ if the Markov
chain $\theta - (T_x(X), T_y(Y)) - (X, Y)$ holds.

Identifying local statistics that are globally sufficient can 
be accomplished in theory via the factorization theorem. 
The process of using the factorization theorem may become
cumbersome in a decentralized system, or inapplicable when
the precise joint distribution of the data in the network is not
available at local nodes. The following results relate local
sufficiency and global sufficiency for a class of distributed
inference problems.

Lemma 1: Let $X, Y \sim p(x, y|\theta)$ and suppose there exists
a random variable $W$ such that

$$\theta - W - (X, Y). \qquad (1)$$

A statistic $T(X, Y)$ that is sufficient for $W$ is also sufficient
for $\theta$.

Proof: The Markov chain (1) implies that $\theta - W -
(X, Y, T(X, Y))$ forms a Markov chain for any statistic
$T(X, Y)$. That $T(X, Y)$ is sufficient for $W$ implies the
Markov chain $W - T(X, Y) - (X, Y)$. Since $T(X, Y)$ is a function
of $(X, Y)$, the joint distribution factors as
$p(\theta, w, x, y) = p(x, y)\, p(w|T(x, y))\, p(\theta|w)$, so these
two Markov chains give rise to the long Markov chain

$$\theta - W - T(X, Y) - (X, Y).$$

Therefore, $T(X, Y)$ is sufficient for $\theta$. ■
Lemma 1 is not useful by itself, as $T(X, Y)$ is a function of
the global data, which is not available at either node.
Its main use is in establishing the following result.

Theorem 1: Let $X, Y \sim p(x, y|\theta)$ and suppose there exists
a random variable $W$ such that $\theta - W - (X, Y)$. Let $T(W)$
be a sufficient statistic for $\theta$, i.e., $\theta - T(W) - W$.

1) If a pair of statistics $(T_x(X), T_y(Y))$ is globally sufficient
for $T(W)$, then it is globally sufficient for $\theta$.

2) If $T(W)$ induces conditional independence between $X$
and $Y$, and $(T_x(X), T_y(Y))$ are locally sufficient for
$T(W)$, then $(T_x(X), T_y(Y))$ are globally sufficient for
$\theta$.

Proof: To prove 1), from Lemma 1 we only need to show
that the Markov chain $\theta - T(W) - (X, Y)$ holds. Since $T(W)$
is a function of $W$, the Markov chain $T(W) - (\theta, W) - (X, Y)$
holds, which together with $\theta - W - (X, Y)$ results in the
Markov chain $(\theta, T(W)) - W - (X, Y)$. Combined with the Markov
chain $\theta - T(W) - W$, we get $\theta - T(W) - W - (X, Y)$, which
implies $\theta - T(W) - (X, Y)$.

For 2), since conditional independence ensures that locally
sufficient statistics are globally sufficient, $(T_x(X), T_y(Y))$ are
sufficient for $T(W)$. The first result then establishes that they
are also sufficient for $\theta$. ■

Remark 1: It is shown in [12] that any general distributed
inference model can be represented as an HCI model and vice
versa, where the HCI model is constructed by introducing a
hidden variable $W$ such that the Markov chains $\theta - W - (X, Y)$
and $X - W - Y$ hold. Therefore, Theorem 1 indicates that under
the HCI model, local sufficiency with respect to the hidden
variable implies global sufficiency.
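Theorem 1 can be checked by brute force on a small discrete HCI model. The sketch below is a hypothetical illustration (the model, its numbers, and the function names are assumptions): $W$ takes four values, each node observes two bits that are i.i.d. Bernoulli given $W$, and we take $T(W) = W$. By construction $\theta - W - (X, Y)$ and $X - W - Y$ hold, while $X$ and $Y$ remain coupled through $W$ after conditioning on $\theta$. The bit counts are then locally sufficient for $W$ at each node, and the code tests whether the posterior of $\theta$ depends on $(X, Y)$ only through a candidate statistic, i.e., whether $\theta$ - statistic - $(X, Y)$ holds. Exact rational arithmetic via `fractions.Fraction` avoids spurious mismatches from rounding.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical discrete HCI model (all numbers are illustrative assumptions).
PRIOR = {0: F(1, 2), 1: F(1, 2)}
P_W = {0: [F(1, 2), F(1, 4), F(1, 8), F(1, 8)],   # p(w | theta = 0)
       1: [F(1, 8), F(1, 8), F(1, 4), F(1, 2)]}   # p(w | theta = 1)
Q_X = [F(1, 5), F(2, 5), F(3, 5), F(4, 5)]        # P(X_j = 1 | W = w)
Q_Y = [F(1, 3), F(1, 2), F(2, 3), F(5, 6)]        # P(Y_j = 1 | W = w)
N = 2                                             # bits observed per node


def bernoulli_prob(bits, q):
    """Probability of an i.i.d. Bernoulli(q) bit string."""
    out = F(1)
    for b in bits:
        out *= q if b else 1 - q
    return out


def joint(theta, x, y):
    """p(theta, x, y), marginalizing the hidden variable W."""
    return PRIOR[theta] * sum(
        P_W[theta][w] * bernoulli_prob(x, Q_X[w]) * bernoulli_prob(y, Q_Y[w])
        for w in range(4))


def is_globally_sufficient(stat):
    """True if p(theta = 1 | x, y) depends on (x, y) only through stat(x, y)."""
    seen = {}
    for x in product((0, 1), repeat=N):
        for y in product((0, 1), repeat=N):
            num = joint(1, x, y)
            post = num / (num + joint(0, x, y))
            if seen.setdefault(stat(x, y), post) != post:
                return False
    return True


# Local statistics sufficient for W at each node: the bit counts.
print(is_globally_sufficient(lambda x, y: (sum(x), sum(y))))   # True
# Discarding part of X breaks global sufficiency in this model.
print(is_globally_sufficient(lambda x, y: (x[0], sum(y))))     # False
```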

From the above result, it is clear that whether $T_x(X)$ is
globally sufficient depends also on $T_y(Y)$, and vice versa. This
coupling effect makes the global sufficiency property rather
difficult to study. In the following, we consider a somewhat
simplified situation where one is interested in data reduction
at one node, provided that a locally sufficient statistic from the
other node is available at the fusion center. That is, if $T_y(Y)$
is known to be a locally sufficient statistic, what should node
X transmit such that $T_x(X)$ forms a globally sufficient
statistic together with $T_y(Y)$?

Theorem 2: Let $X, Y$ be distributed according to $p(x, y|\theta)$.
Assume $T_y(Y)$ is a locally sufficient statistic for $\theta$. Then
$(T_x(X), T_y(Y))$ are globally sufficient for $\theta$ if and only if
there exist functions $g(t_1|t_2, \theta)$ and $h(x, y)$ such that, for
all sample points $(x, y)$ and all parameter values $\theta$, the
conditional probability $p(x|y, \theta)$ satisfies

$$p(x|y, \theta) = g(T_x(x)|T_y(y), \theta)\, h(x, y). \qquad (2)$$

Proof: Directly from the factorization theorem for
$(X, Y)$ and by rewriting $p(x, y|\theta) = p(y|\theta)\, p(x|y, \theta)$. ■

Remark 2: Given a locally sufficient statistic $T_y(Y)$, there
may not exist any $T_x(X)$ that forms a globally sufficient
statistic together with $T_y(Y)$.

Remark 3: The above results are shown under the assumption
that $\theta$ is a random variable; similar results can be obtained
when $\theta$ is not random by resorting to the factorization theorem
instead of the data processing inequality.

Example 1: For $i = 1, \ldots, n$, let

$$X_i = Z + U_i, \qquad Y_i = Z + V_i,$$

where $Z, U_1, \ldots, U_n, V_1, \ldots, V_n$ are mutually independent
Gaussian random variables such that $Z \sim \mathcal{N}(\theta, \rho)$,
$U_i \sim \mathcal{N}(0, 1 - \rho)$, and $V_i \sim \mathcal{N}(0, 1 - \rho)$.
Thus, $(X_i, Y_i) \sim \mathcal{N}(\theta, \theta, 1, 1, \rho)$. The parameter
of inference interest is $\theta$. $X$ and $Y$ are not conditionally
independent given $\theta$.

Let $T(W) = W = Z$. Then $Z$ depends on $\theta$ through its
mean value. Clearly, $Z$ satisfies the Markov chains $\theta - Z -
(X, Y)$ and $X - Z - Y$, as required by the HCI model. Thus,
from Theorem 1, the locally sufficient statistic pair for $Z$,
$(\sum_i X_i, \sum_i Y_i)$, is globally sufficient for $\theta$.
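Example 1 can also be checked directly through the likelihood-ratio idea behind the factorization theorem: if $(\sum_i X_i, \sum_i Y_i)$ is sufficient, then for two samples with the same pair of sums, $\log p(x, y|\theta) - \log p(x', y'|\theta)$ does not vary with $\theta$. The sketch below assumes $\rho = 0.6$ and hand-picked data (both hypothetical) and uses the closed-form inverse of the covariance $(1 - \rho)I + \rho \mathbf{1}\mathbf{1}^T$ obtained from the Sherman-Morrison formula. It is a sanity check on a grid of $\theta$ values, not a proof.

```python
# Stack z = (x_1..x_n, y_1..y_n). Under Example 1, z ~ N(theta * 1, Sigma) with
# Sigma = (1 - rho) I + rho 1 1^T (0 < rho < 1), and by Sherman-Morrison
# Sigma^{-1} = I / (1 - rho) - rho 1 1^T / ((1 - rho)(1 - rho + N rho)).


def loglik(z, theta, rho):
    """Gaussian log-likelihood of z, up to the additive log-det and 2*pi terms."""
    n_total = len(z)
    a = 1.0 / (1.0 - rho)
    b = rho / ((1.0 - rho) * (1.0 - rho + n_total * rho))
    d = [zi - theta for zi in z]
    return -0.5 * (a * sum(di * di for di in d) - b * sum(d) ** 2)


def llr_constant_in_theta(z1, z2, rho, thetas, tol=1e-9):
    """True if log p(z1|theta) - log p(z2|theta) does not vary over the grid."""
    diffs = [loglik(z1, t, rho) - loglik(z2, t, rho) for t in thetas]
    return max(diffs) - min(diffs) < tol


rho = 0.6                                   # hypothetical correlation
thetas = [-2.0, -0.5, 0.0, 1.0, 2.0]
x, y = [0.3, 1.2, -0.5], [2.0, 0.1, 0.4]    # sums 1.0 and 2.5
x2, y2 = [1.0, 0.0, 0.0], [0.5, 1.0, 1.0]   # different data, same sums
x3 = [1.0, 0.0, 0.5]                        # sum 1.5
print(llr_constant_in_theta(x + y, x2 + y2, rho, thetas))   # True
print(llr_constant_in_theta(x + y, x3 + y, rho, thetas))    # False
```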

Example 2: Consider the hypothesis testing problem where the
observations $X_i$, $i = 1, \ldots, k$, satisfy the following model:

$$H_0: X_i = N_i,$$
$$H_1: X_i = h_i S + N_i,$$

where the $h_i$'s are complex Gaussian and independent of each
other and of the other variables, $S$ is a QAM signal taking values
$s_m = r_m e^{j\theta_m}$ with probability $\pi_m$ for $m = 1, \ldots, M$,
where $\theta_m$ is the phase of the $m$th constellation point, and
$N_i$ is the independent observation noise at the $i$th sensor with
$N_i \sim \mathcal{N}(0, \sigma^2)$. The above model describes the
problem of detecting the presence of a QAM signal in independent
Rayleigh fading using $k$ sensors, e.g., as in cooperative spectrum
sensing. Each sensor makes a local decision $U_i = \gamma_i(X_i)$ and
sends it to a fusion center, which makes a final decision regarding
the hypothesis under test.

The observations are not conditionally independent given
$H_1$. Let $W = S$, which induces conditional independence
among the observations under both hypotheses. It is easy to see
that $T(W) = |S|$ is sufficient for $H$ given $S$. Thus, the Markov
chain $H - |S| - S - (X_1, \ldots, X_k)$ holds.

On the other hand, given $|S|$, the observations are conditionally
independent of each other under the QAM and Rayleigh fading
assumptions. For any $i$, $|X_i|$ is a minimal sufficient statistic
for $|S|$. This can be easily verified by examining the ratio
$p(x_i|\,|S|)/p(x_i'|\,|S|)$ for two sample points $x_i$ and $x_i'$.
Therefore, by Theorem 1, $\{|X_i|\}$ is globally sufficient for $H$.

The above observation can be used to establish that the 
optimal detector at each local sensor is an energy detector for 
the model described in Example 2 [13]. 
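The role of $|X_i|$ can be illustrated numerically. The sketch below assumes, beyond what the example specifies, that $h_i \sim \mathcal{CN}(0, 1)$, that the noise is circularly symmetric complex Gaussian $\mathcal{CN}(0, \sigma^2)$, and a hypothetical 16-QAM constellation with equiprobable points; `local_lr` and `cn_pdf` are illustrative names. Given $S = s$, $h_i s + N_i \sim \mathcal{CN}(0, |s|^2 + \sigma^2)$, so the local likelihood ratio is a mixture over constellation radii only. The code checks that two observations with the same magnitude but different phases give the same likelihood ratio, and that the ratio increases with $|x|$, which is consistent with an energy detector at each sensor.

```python
import cmath
import math

# Assumed 16-QAM with levels {+-1, +-3}, equiprobable points. Only the radii
# |s_m| matter, so points with equal radius are grouped:
# |s|^2 = 2 (4 points), 10 (8 points), 18 (4 points).
RADII = [math.sqrt(2.0), math.sqrt(10.0), math.sqrt(18.0)]
PROBS = [0.25, 0.5, 0.25]


def cn_pdf(x, var):
    """Density of a circularly symmetric complex Gaussian CN(0, var) at x."""
    return math.exp(-abs(x) ** 2 / var) / (math.pi * var)


def local_lr(x, sigma2=1.0):
    """p(x | H1) / p(x | H0) at one sensor, assuming h ~ CN(0, 1).

    Given S = s, h*s + N ~ CN(0, |s|^2 + sigma2), so the H1 density is a
    mixture over the constellation radii only.
    """
    p1 = sum(p * cn_pdf(x, r * r + sigma2) for r, p in zip(RADII, PROBS))
    return p1 / cn_pdf(x, sigma2)


x1 = 1.5 + 0.5j
x2 = cmath.rect(abs(x1), 2.0)          # same magnitude, different phase
print(local_lr(x1), local_lr(x2))      # equal up to rounding
print(local_lr(2.0) > local_lr(1.0))   # True: the ratio grows with |x|
```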

III. Sufficiency for Tandem Network 

A tandem network, as illustrated in Fig. 2, is one in which
compressed data are transmitted to a node that also has
its own observation. The second node then makes a final
decision using its own data and the input from the first node.
Even though node X does not observe Y, knowing that Y is
available at the fusion center should have an impact on how
node X summarizes its own data X. A natural way of extending the
sufficiency principle to this network is as follows: the inference
performance should remain the same whether the inference
is based on $(X, Y)$ or $(T(X), Y)$. From the data processing
inequality, the sufficiency of $T(X)$ can thus be characterized
using the Markov chain $\theta - (T(X), Y) - (X, Y)$. Given that
$T(X)$ is a function of $X$, it is straightforward to show that
the Markov chain $\theta - (T(X), Y) - (X, Y)$ is equivalent to
$\theta - (T(X), Y) - X$. This motivates the following definition
of conditional sufficiency.

Definition 1: A statistic $T(X)$ is a conditional sufficient
statistic for $\theta$, conditioned on $Y$, if the conditional
distribution of the sample $X$ given the values of $T(X)$ and $Y$
does not depend on $\theta$.

The definition allows us to generalize a number of classical 
results related to sufficient statistics. 

Theorem 3: Let $X, Y$ be distributed according to $p(x, y|\theta)$.
Let $g(T(x), y|\theta)$ be the joint distribution of $T(X)$ and $Y$. Then
$T(X)$ is a conditional sufficient statistic for $\theta$, conditioned on
$Y$, if for every $(x, y)$ pair, the ratio $p(x, y|\theta)/g(T(x), y|\theta)$
is constant as a function of $\theta$.

Similarly, the Neyman-Fisher factorization theorem can also 
be generalized to the conditional case. 

Theorem 4: Let $X, Y$ be distributed according to $p(x, y|\theta)$.
A statistic $T(X)$ is conditionally sufficient for $\theta$, conditioned
on $Y$, if and only if there exist functions $g(t, y|\theta)$ and $h(x, y)$
such that

$$p(x, y|\theta) = g(T(x), y|\theta)\, h(x, y)$$

for all sample points $(x, y)$ and all parameter values $\theta$.

The proof can be constructed similarly to that of the
factorization theorem in [2, Theorem 6.2.6]. In fact, this result
can be viewed as a special case of Theorem 2 with $T_y(Y) = Y$,
using the fact that $Y$ itself is trivially a locally sufficient
statistic for $\theta$.

Remark 4: For tandem networks, the definition of conditional
sufficiency is more general than global sufficiency. This
is because if there exists a pair of statistics $(T_x(X), T_y(Y))$
that is globally sufficient for $\theta$, then $T_x(X)$ must be
conditionally sufficient for $\theta$, conditioned on $Y$. Therefore, for
inference problems under the HCI model, one can also obtain
a conditional sufficient statistic using Theorem 1.

Similar to the definition of minimal sufficient statistic [2], 
we can define the notion of minimal conditional sufficient 
statistic as follows. 

Definition 2: A conditional sufficient statistic $T(X)$ is a
minimal conditional sufficient statistic if it is a function of
any other conditional sufficient statistic $U(X)$.

The following theorem provides a meaningful way to find 
minimal conditional sufficient statistics. 

Theorem 5: Let $X, Y$ be distributed according to $p(x, y|\theta)$.
Suppose there exists a function $T(x)$ such that for every two
sample points $x$, $x'$, and every $y$, the ratio
$p(x, y|\theta)/p(x', y|\theta)$ is constant as a function of $\theta$ if
and only if $T(x) = T(x')$. Then $T(X)$ is a minimal conditional
sufficient statistic for $\theta$ given $Y$.

The proof follows the same lines as the proof of Theorem 6.2.13
in [2].
Fig. 3. Source coding with side information.

Example 3: Let $(X_i, Y_i)$, $i = 1, \ldots, n$, be independent and
identically distributed (i.i.d.) according to $p(x, y|\theta)$, where

$$p(x, y|\theta) = \begin{cases} 2, & \theta < y < x < \theta + 1, \\ 0, & \text{otherwise}. \end{cases}$$

The marginal distributions of $X$ and $Y$ are therefore

$$p(x|\theta) = 2(x - \theta), \quad \theta < x < \theta + 1,$$
$$p(y|\theta) = 2(\theta + 1 - y), \quad \theta < y < \theta + 1.$$

It can be easily shown that no data reduction is possible 
using the marginal distribution, i.e., no meaningful locally suf- 
ficient statistics can be found other than the data themselves. 
Note that, given $Y = y$, $X$ is uniformly distributed on the
interval $(y, \theta + 1)$; therefore, we have

$$p(x|y, \theta) = \prod_{i=1}^{n} \frac{1}{\theta + 1 - y_i}, \quad y_i < x_i, \ \max_i\{x_i\} - 1 < \theta,$$

and $p(x|y, \theta) = 0$ otherwise.

Thus, $\max_i\{X_i\}$ is a conditional sufficient statistic for $\theta$,
conditioned on $Y$. Similarly, we can obtain that $\min_i\{Y_i\}$
is a conditional sufficient statistic for $\theta$ (computed from the
$Y$ sequence), conditioned on the $X$ sequence. This is consistent
with the fact that $(\max_i\{X_i\}, \min_i\{Y_i\})$ is globally sufficient
for $\theta$ when both $X$ and $Y$ are available.
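The conditional factorization above can be sanity-checked numerically: for a fixed $y$ sequence, $p(x|y, \theta)$ should be the same function of $\theta$ for any two $x$ sequences with the same maximum (and $y_i < x_i$), and should differ when the maxima differ. The data and the $\theta$ grid below are hypothetical choices; `cond_density` simply codes the displayed formula.

```python
# Given Y_i = y_i, X_i is uniform on (y_i, theta + 1), so p(x | y, theta) is
# prod_i 1 / (theta + 1 - y_i) on {y_i < x_i, max_i x_i < theta + 1}, else 0.
# Conditioning on y only makes sense for theta < min_i y_i.


def cond_density(x, y, theta):
    """p(x | y, theta) for the i.i.d. pairs of Example 3."""
    if any(xi <= yi for xi, yi in zip(x, y)) or max(x) >= theta + 1:
        return 0.0
    out = 1.0
    for yi in y:
        out /= theta + 1 - yi
    return out


y = [0.2, 0.5, 0.3]                           # hypothetical side information
x_a = [0.9, 0.6, 0.4]                         # max 0.9
x_b = [0.5, 0.9, 0.35]                        # different data, same max 0.9
x_c = [0.95, 0.6, 0.4]                        # max 0.95
thetas = [-0.3, -0.08, -0.05, 0.0, 0.1, 0.15]  # all below min(y) = 0.2


def curve(x):
    """p(x | y, theta) evaluated on the theta grid."""
    return [cond_density(x, y, t) for t in thetas]


print(curve(x_a) == curve(x_b))   # True: depends on x only via max(x)
print(curve(x_a) == curve(x_c))   # False
```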

IV. Sufficiency and Distributed Source Coding 

For the point-to-point remote rate distortion problem, it was
shown that sufficient-statistic-based data reduction achieves
the same rate distortion function as the original data [14].
This section studies the connection between the sufficiency 
principle and distributed source coding problems. 

A. Source coding with side information 

Consider the lossless source coding problem in Fig. 3.
An i.i.d. sequence of source pairs $(X^n, Y^n)$ is encoded
separately with rates $(R_1, R_2)$, and the descriptions are sent
to a decoder where only $X^n$ is to be recovered with
asymptotically vanishing probability of error. A rate pair $(R_1, R_2)$
is achievable if there exists a lossless source code with rates
$(R_1, R_2)$. The rate region $\mathcal{R}$ is defined as the closure of the
set of all achievable rate pairs and was shown to be [15], [16]

$$\mathcal{R} = \{(R_1, R_2) : R_1 \ge H(X|U),\ R_2 \ge I(Y; U),\ X - Y - U\}.$$

Assume $T(Y)$ is a sufficient statistic for $X$, i.e., $X - T(Y) - Y$.
Define

$$\mathcal{R}' = \{(R_1, R_2) : R_1 \ge H(X|U),\ R_2 \ge I(T(Y); U),\ X - T(Y) - U\},$$

which is the rate region for encoding $(X^n, T^n(Y^n))$, where
$T^n(Y^n)$ is the i.i.d. sequence $T(Y_i)$, $i = 1, \ldots, n$. The
following theorem shows that encoding the reduced data $T^n(Y^n)$
achieves the same rate region as encoding the original data.

Theorem 6:

$$\mathcal{R} = \mathcal{R}'.$$

Proof: It is straightforward to show $\mathcal{R} \supseteq \mathcal{R}'$. To show
$\mathcal{R} \subseteq \mathcal{R}'$, let $(R_1, R_2) \in \mathcal{R}$; then there exists a $U$
such that $X - Y - U$, $R_1 \ge H(X|U)$, and $R_2 \ge I(Y; U)$. Since
$(X, T(Y)) - Y - U$ and $X - T(Y) - Y$, the Markov chain
$X - T(Y) - Y - U$ holds. Therefore, $R_1 \ge H(X|U)$ and
$R_2 \ge I(Y; U) \ge I(T(Y); U)$ by the data processing inequality.
Thus, $(R_1, R_2) \in \mathcal{R}'$. ■

A direct consequence of Theorem 6 is that the corner point
of the rate region, $(R_1 = H(X|Y), R_2 = H(T(Y)))$, may be
strictly smaller than $(R_1 = H(X|Y), R_2 = H(Y))$; the former is
achieved by $U = T(Y)$, since $X - T(Y) - Y$ gives $H(X|T(Y)) = H(X|Y)$.
This observation was first reported in [17]. Specifically, the corner
point can be obtained by finding the smallest admissible $R_2$
when $R_1 = H(X|Y)$, and it was shown in [17] that

$$\inf\{R_2 : (H(X|Y), R_2) \in \mathcal{R}\} = \min_{U:\, X - Y - U,\ X - U - Y} I(Y; U) = H(\psi(Y)).$$

As it turns out, the quantity $\psi(Y)$ is precisely the minimal
sufficient statistic for $X$ based on $Y$.

B. Remote source coding with side information

Consider the model in Fig. 4, which is remote source
coding with side information available at both the encoder and
the decoder. We will show that in this problem, the rate distortion
function does not change when a conditional sufficient statistic
$T(X)$ is encoded instead of $X$.

Let $(X, Y, Z) \sim p(x, y, z)$ and let $d(z, \hat{z})$ be a given
distortion function. Let $(X^n, Y^n, Z^n)$ be i.i.d. sequences drawn
from $(X, Y, Z)$. Upon receiving the sequences $(X^n, Y^n)$, the
encoder generates a description of the sources with rate $R$
and sends it to the decoder, which has the side information
$Y^n$ and wishes to reproduce $Z^n$ with distortion $D$. The rate
distortion function $R(D)$ is the infimum of rates $R$ such that
there exist maps $f_n: \mathcal{X}^n \times \mathcal{Y}^n \to \{1, \ldots, 2^{nR}\}$ and
$g_n: \mathcal{Y}^n \times \{1, \ldots, 2^{nR}\} \to \hat{\mathcal{Z}}^n$ such that

$$\limsup_{n \to \infty} E\, d(Z^n, g_n(Y^n, f_n(X^n, Y^n))) \le D.$$

It is easy to show that the rate distortion function $R(D)$ is

$$R(D) = \min_{p(u|x,y)} \min_{f} I(X; U|Y),$$



where the minimum is taken over all $p(u|x, y)$ and functions
$\hat{z} = f(u, y)$ such that

$$E_1[d(Z, \hat{Z})] = \sum_{x, y, z, u} p(x, y, z)\, p(u|x, y)\, d(z, f(u, y)) \le D. \qquad (3)$$



Let $T(X)$ be a conditional sufficient statistic for the remote
source $Z$, conditioned on $Y$ (i.e., $Z - (T(X), Y) - (X, Y)$).
Fig. 4. Remote source coding with side information.



Define

$$R'(D) = \min_{p(u|t,y)} \min_{f} I(T(X); U|Y),$$

where the minimum is taken over all $p(u|t, y)$ and functions
$\hat{z} = f(u, y)$ such that

$$E_2[d(Z, \hat{Z})] = \sum_{t, y, z, u} p(t, y, z)\, p(u|t, y)\, d(z, f(u, y)) \le D. \qquad (4)$$

$R'(D)$ is the rate distortion function when $(T^n(X^n), Y^n)$,
instead of $(X^n, Y^n)$, is available at the encoder, where
$T^n(X^n)$ is the i.i.d. sequence $T(X_i)$, $i = 1, \ldots, n$.
Theorem 7:

$$R(D) = R'(D).$$

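Proof: It is obvious that $R(D) \le R'(D)$, since an encoder that
observes $(X^n, Y^n)$ can compute $T^n(X^n)$.
We now show $R(D) \ge R'(D)$. For any $U$ that achieves
$R(D)$, since $T(X)$ is a function of $X$, we have the Markov
chain $(T(X), Y) - (X, Y) - U$; hence

$$I(X; U|Y) \ge I(T(X); U|Y).$$

Given that $T(X)$ is a conditional sufficient statistic for $Z$,
we have the following:

$$\begin{aligned}
D &\ge E_1[d(Z, \hat{Z})] \\
&= \sum_{y, z, u} d(z, f(u, y)) \Big( \sum_{x} p(z|x, y)\, p(x, y, u) \Big) \\
&= \sum_{y, z, u} d(z, f(u, y)) \sum_{t} p(z|t, y) \sum_{x: T(x) = t} p(x, y, u) \qquad (5) \\
&= \sum_{t, y, z, u} d(z, f(u, y))\, p(z|t, y)\, p(t, y, u), \qquad (6)
\end{aligned}$$

where (5) comes from the definition of conditional sufficiency
and (6) holds by defining $p(t, y, u) = \sum_{x: T(x) = t} p(x, y, u)$.
Since $p(t, y, z)\, p(u|t, y) = p(z|t, y)\, p(t, y, u)$, the last sum is
exactly $E_2[d(Z, \hat{Z})]$ in (4). This shows that for any $p(u|x, y)$
and $f(u, y)$ satisfying (3), there exist $p(u|t, y)$ and $f(u, y)$ such
that (4) is satisfied, with $I(T(X); U|Y) \le I(X; U|Y)$. Thus,
$R(D) \ge R'(D)$. ■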
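The key step (5)-(6), together with $I(X; U|Y) \ge I(T(X); U|Y)$, can be verified on a small discrete model. The sketch below is an assumed toy example: the alphabets, probabilities, the statistic `T`, the test channel `p_u`, the map `f`, and the Hamming distortion are all hypothetical. $Z$ is generated from $(T(X), Y)$ only, so $Z - (T(X), Y) - (X, Y)$ holds by construction, while the test channel deliberately uses more of $X$ than $T(X)$. Exact fractions make the equality check exact; the mutual informations are computed in floating point.

```python
import math
from fractions import Fraction as F
from itertools import product

# Hypothetical toy model (all alphabets and numbers are illustrative).
XS, YS, ZS, US = range(4), range(2), range(2), range(2)


def T(x):
    """Statistic merging {0, 1} and {2, 3}."""
    return x // 2


P_XY = {(x, y): F(1 + x + 2 * y, 28) for x in XS for y in YS}
# P(Z = 1 | T(X) = t, Y = y): Z depends on X only through T(X).
P_Z1 = {(0, 0): F(1, 5), (0, 1): F(2, 3), (1, 0): F(3, 4), (1, 1): F(1, 3)}


def p_z(z, t, y):
    return P_Z1[(t, y)] if z == 1 else 1 - P_Z1[(t, y)]


def p_u(u, x, y):
    """A test channel p(u | x, y) that uses more of X than T(X)."""
    q = F(x + 1, 5) if y == 0 else F(4 - x, 5)
    return q if u == 1 else 1 - q


def f(u, y):
    return u ^ y                  # hypothetical reconstruction z_hat = f(u, y)


def d(z, z_hat):
    return int(z != z_hat)        # Hamming distortion


def expected_distortion_1():
    """E_1[d(Z, Z_hat)] as in (3)."""
    return sum(P_XY[(x, y)] * p_z(z, T(x), y) * p_u(u, x, y) * d(z, f(u, y))
               for x, y, z, u in product(XS, YS, ZS, US))


def joint_tyu():
    """p(t, y, u) = sum over x with T(x) = t of p(x, y, u)."""
    out = {}
    for x, y, u in product(XS, YS, US):
        key = (T(x), y, u)
        out[key] = out.get(key, 0) + P_XY[(x, y)] * p_u(u, x, y)
    return out


def expected_distortion_2():
    """Right-hand side of (6): sum of d * p(z | t, y) * p(t, y, u)."""
    return sum(p_z(z, t, y) * w * d(z, f(u, y))
               for (t, y, u), w in joint_tyu().items() for z in ZS)


def cond_mi(joint):
    """I(A; U | Y) in bits from a dict {(a, y, u): probability}."""
    p_y, p_ay, p_yu = {}, {}, {}
    for (a, y, u), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p
        p_ay[(a, y)] = p_ay.get((a, y), 0) + p
        p_yu[(y, u)] = p_yu.get((y, u), 0) + p
    return sum(float(p) * math.log2(float(p * p_y[y] / (p_ay[(a, y)] * p_yu[(y, u)])))
               for (a, y, u), p in joint.items() if p > 0)


joint_xyu = {(x, y, u): P_XY[(x, y)] * p_u(u, x, y)
             for x, y, u in product(XS, YS, US)}
print(expected_distortion_1() == expected_distortion_2())   # True
print(cond_mi(joint_xyu) >= cond_mi(joint_tyu()))           # True
```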

V. Conclusion 

This paper developed the sufficiency principle that guides
local data reduction in networked inference with dependent
observations for two classes of inference networks: parallel
networks and tandem networks.

For the parallel network, a previously proposed hierarchical
conditional independence model is used to obtain conditions
under which local sufficiency implies global sufficiency. A
cooperative spectrum sensing example is given to illustrate the
usefulness of such an approach. For the tandem network, we
introduced the notion of conditional sufficiency and developed
related theories and tools.

The sufficiency principle for networked inference has
applications beyond decentralized inference. In particular,
data reduction using suitable notions of sufficiency appears
to incur no penalty on the rate region for various distributed
source coding problems. There are potentially other distributed
source coding problems where sufficiency-based data reduction
may also prove to be optimal.